A Combination of Active Learning and Semi-supervised Learning Starting with Positive and Unlabeled Examples for Word Sense Disambiguation: An Empirical Study on Japanese Web Search Query

Authors

  • Makoto Imamura
  • Yasuhiro Takayama
  • Nobuhiro Kaji
  • Masashi Toyoda
  • Masaru Kitsuregawa
Abstract

This paper addresses the bottleneck of obtaining training data for word sense disambiguation (WSD) in the domain of web queries, where the complete set of senses of an ambiguous word is unknown. We present a method that combines active learning and semi-supervised learning to handle the case in which only positive examples, i.e., examples that have the expected word sense in Web search results, are given. The novelty of our approach is the use of “pseudo negative examples” with reliable confidence scores estimated by a classifier trained on positive and unlabeled examples. We show experimentally that, on several Japanese Web search datasets, our proposed method achieves WSD accuracy close to that of a method using manually prepared negative examples.
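
The abstract does not specify the classifier or the confidence estimate. The Python sketch below illustrates the general idea under stated assumptions: it scores unlabeled contexts with a model built only from positive and unlabeled examples, treats the lowest-scoring contexts as pseudo negative examples, and picks the most uncertain context as the next one to send to a human annotator. Several details are hypothetical illustration choices and are not taken from the paper: the Rocchio-style centroid scoring, whitespace tokenization, uncertainty sampling as the active-learning query rule, and the helper names `pu_scores`, `select_pseudo_negatives`, and `select_query_for_annotation`.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Average term-count vector of a non-empty list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pu_scores(positives, unlabeled):
    """Hypothetical PU scoring: similarity to the positive centroid minus
    similarity to the unlabeled centroid (Rocchio-style). Higher scores
    suggest the context has the expected sense."""
    pos_c = centroid([vectorize(t) for t in positives])
    unl_c = centroid([vectorize(t) for t in unlabeled])
    return [cosine(vectorize(t), pos_c) - cosine(vectorize(t), unl_c)
            for t in unlabeled]

def select_pseudo_negatives(positives, unlabeled, k):
    """Return the k unlabeled contexts with the lowest scores, i.e. the
    ones most confidently NOT in the expected sense."""
    scores = pu_scores(positives, unlabeled)
    ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i])
    return [unlabeled[i] for i in ranked[:k]]

def select_query_for_annotation(positives, unlabeled):
    """Hypothetical active-learning step (uncertainty sampling): return the
    unlabeled context whose score is closest to the 0 decision boundary."""
    scores = pu_scores(positives, unlabeled)
    best = min(range(len(unlabeled)), key=lambda i: abs(scores[i]))
    return unlabeled[best]

# Toy example (invented data): "java" in its programming-language sense.
positives = ["java programming language code", "java code compiler"]
unlabeled = ["java coffee beans cafe",
             "java language tutorial code",
             "java island indonesia travel"]

print(select_pseudo_negatives(positives, unlabeled, 2))
print(select_query_for_annotation(positives, unlabeled))
```

On this toy data, the coffee and island contexts are selected as pseudo negatives, and the tutorial context, whose score lies nearest the boundary, is the one an annotator would be asked to label. In a full pipeline, the positives, the pseudo negatives, and any newly annotated examples would then train an ordinary two-class WSD classifier.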


Similar Resources

Self-training and co-training in biomedical word sense disambiguation

Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction, attempting to select the proper sense of ambiguous words. Due to the scarcity of training data, semi-supervised learning, which profits from seed annotated examples and a large set of unlabeled data, is worth researching. We present preliminary results of two semi-supervised learnin...

Learning model order from labeled and unlabeled data for partially supervised classification, with application to word sense disambiguation

Previous partially supervised classification methods can partition unlabeled data into positive examples and negative examples for a given class by learning from positive labeled examples and unlabeled examples, but they cannot further group the negative examples into meaningful clusters even if there are many different classes in the negative examples. Here we propose an automatic method to o...

GIR at the NTCIR-12 Temporalia Task

The GIR team participated in the NTCIR-12 Temporal Information Access (Temporalia) Task. This report describes our approach to solving the Temporal Intent Disambiguation (TID) problem and discusses the official results. We explore the rich temporal information in the labeled and unlabeled search queries. A semi-supervised linear classifier is then built to predict the temporal classes for e...

Word Sense Disambiguation with Semi-Supervised Learning

Current word sense disambiguation (WSD) systems based on supervised learning are still limited in that they do not work well for all words in a language. One of the main reasons is the lack of sufficient training data. In this paper, we investigate the use of unlabeled training data for WSD, in the framework of semi-supervised learning. Four semi-supervised learning algorithms are evaluated on 2...

Semi-supervised learning integrated with classifier combination for word sense disambiguation

Word sense disambiguation (WSD) is the problem of determining the right sense of a polysemous word in a certain context. This paper investigates the use of unlabeled data for WSD within a framework of semi-supervised learning, in which labeled data is iteratively extended from unlabeled data. Focusing on this approach, we first explicitly identify and analyze three problems inherently occurred ...

Publication date: 2009